
Conversation

@wks wks commented Oct 9, 2025

We remove OnAllocationFail and add boolean fields to AllocationOptions:

  • allow_overcommit: whether we allow overcommit
  • at_safepoint: whether this allocation is at a safepoint
  • allow_oom_call: whether to call Collection::out_of_memory

Now Space::acquire always polls before trying to get new pages. In particular, when allow_overcommit == true, polling and over-committing happen in the same allocation attempt. If we also set at_safepoint == false, the current mutator can allocate normally in this allocation, but will block for GC at the nearest safepoint. This is useful for certain VMs.
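
For reference, here is a rough sketch of what the options struct could look like, based only on the three fields listed above. The actual definition in mmtk-core (derives, extra fields, exact defaults) may differ, and the allow_oom_call default shown here is an assumption.

```rust
/// A sketch only, not the actual mmtk-core definition of AllocationOptions.
#[derive(Clone, Copy, Debug)]
pub struct AllocationOptions {
    /// Whether this allocation may exceed the specified heap size (over-commit).
    pub allow_overcommit: bool,
    /// Whether this allocation is at a safepoint and may therefore block for GC.
    pub at_safepoint: bool,
    /// Whether this allocation may call Collection::out_of_memory on failure.
    pub allow_oom_call: bool,
}

impl Default for AllocationOptions {
    fn default() -> Self {
        Self {
            allow_overcommit: false, // documented default (see the doc comments quoted later)
            at_safepoint: true,      // documented default (see the doc comments quoted later)
            allow_oom_call: true,    // assumed default; not stated explicitly in this thread
        }
    }
}
```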

@wks wks mentioned this pull request Oct 10, 2025
wks added 2 commits October 13, 2025 17:54
We remove OnAllocationFail and add three boolean fields to
AllocationOptions.
@wks wks force-pushed the feature/overcommit-still-triggers-gc3 branch from 59ff447 to fa47af7 Compare October 13, 2025 10:04

wks commented Oct 13, 2025

After this PR, the decision tree becomes the following (assuming this is a mutator thread and GC is already initialized):

  • Does the allocation option allow eager polling?
    • If yes, poll, and move on.
    • If no, just move on.
  • Is any of the following true? (a) The poll above didn't trigger GC (consider it "not triggered" if we skipped polling), or (b) the allocation option allows over-commit.
    • If yes, try to get pages from the page resource, and move on.
    • If no, just move on.
  • Have we got pages from the page resource?
    • If yes, do the mmapping and return the address.
    • If no, are we at safepoint?
      • If yes, then
        • Have we tried getting new pages from the page resource?
          • If yes, force a GC, and move on.
          • If no, just move on.
        • block for GC.
        • return NULL.
      • If no, then return NULL immediately.

The control flow is more linear than before, with three steps, each using one boolean option.
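
Below is a self-contained, hypothetical sketch of that three-step flow. The helpers poll, get_new_pages, force_gc and block_for_gc are illustrative stand-ins rather than real mmtk-core internals, and the three booleans correspond to the options in the table that follows.

```rust
// A hypothetical sketch of the three-step control flow, not the real
// Space::acquire implementation.
fn acquire_sketch(
    eager_polling: bool,
    allow_overcommit: bool,
    at_safepoint: bool,
    pages: usize,
) -> Option<usize> {
    // Step 1: poll eagerly if the option allows it.
    let gc_triggered = eager_polling && poll(pages);

    // Step 2: try the page resource unless a GC was triggered and
    // over-committing is not allowed.
    let tried = !gc_triggered || allow_overcommit;
    let start = if tried { get_new_pages(pages) } else { None };
    if start.is_some() {
        return start; // the real code would mmap the pages and return the address
    }

    // Step 3: only an allocation at a safepoint may force a GC and block.
    if at_safepoint {
        if tried {
            force_gc(); // we actually tried to get pages and failed
        }
        block_for_gc();
    }
    None // corresponds to returning a null address
}

// Stubs so the sketch compiles; they do not model real GC behavior.
fn poll(_pages: usize) -> bool { false }
fn get_new_pages(_pages: usize) -> Option<usize> { None }
fn force_gc() {}
fn block_for_gc() {}
```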

By combining the three options, we can replicate the behaviors of the previous OnAllocationFail variants.

Variant        eager_polling   allow_overcommit   at_safepoint
RequestGC      true            false              true
ReturnFailure  true            false              false
OverCommit     false           true               false

We can make a new combination that polls (scheduling GC in the background) and over-commits at the same time, postponing blocking for GC to the next safepoint (see the usage sketch after the table below). This can be useful for VMs where allocation never happens at safepoints.

New behavior               eager_polling   allow_overcommit   at_safepoint
both poll and overcommit   true            true               false
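
Using the sketch above, this combination would be requested like so (values taken from the table; the helper and its page count are illustrative):

```rust
// Poll so a GC can be scheduled in the background, still over-commit to
// satisfy this allocation, and defer blocking for GC to the next safepoint.
fn poll_and_overcommit_example() -> Option<usize> {
    acquire_sketch(
        /* eager_polling    */ true,
        /* allow_overcommit */ true,
        /* at_safepoint     */ false,
        /* pages            */ 8, // example page count
    )
}
```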

But I wonder whether we can remove the eager_polling option (i.e. always make it true). I can't think of any use case where we don't want to poll. Polling only affects GC threads in the background. Even if GC is not initialized at this time, GC workers will be able to start the first GC immediately after GC is initialized. So it seems harmless to poll all the time.

@wks wks marked this pull request as ready for review October 14, 2025 07:54
@wks wks requested a review from qinsoon October 14, 2025 07:54

wks commented Oct 15, 2025

We discussed this in today's meeting. We should either

  • remove eager_polling (making the first poll compulsory, unless the thread is not a mutator or GC is not enabled), or
  • allow disabling the second poll (the one after trying to get pages from the page resource and failing), too.

I am in favor of removing eager_polling. I added eager_polling so that the user could replicate the old OnAllocationFail::OverCommit behavior. But we think we can change the behavior as long as the new behavior is more reasonable. As I mentioned before, the GC is scheduled in the background without affecting the execution of the mutator, and I can't think of any reason why the mutator would try not to trigger GC. If it needs "critical section" semantics, we have a separate issue discussing this: #1398

The only thing it may affect is NoGC, which panics immediately in NoGC::schedule_collection. So eager polling may cause the process to panic earlier than before if over-committing is allowed. I think this is actually reasonable, because otherwise the program would simply ignore the heap size if it always over-commits.

Member

@qinsoon qinsoon left a comment

Other than #1400 (comment), this PR looks good to me.

@@ -1,6 +1,8 @@
// GITHUB-CI: MMTK_PLAN=NoGC
Member

Why is this test only for NoGC?

Collaborator Author

Well, it doesn't have to. The test is about the behavior of allocation after exceeding the heap size, and it is not about the GC. So it doesn't really matter which plan it is. But I changed it to "all" just in case any plan triggers GC differently (mainly ConcurrentImmix).


wks commented Oct 15, 2025

I removed eager_polling and added a migration guide.

Member

@qinsoon qinsoon left a comment

The PR looks good to me.

There are just some minor points about the documentation -- I think it currently over-specifies the behavior of the new boolean flags. It’s good to be specific and precise, but the docs shouldn’t expose internal implementation details or define behaviors that are not controlled by the flag.

Comment on lines 35 to 49
/// Whether over-committing is allowed at this allocation site.
///
/// **The default is `false`**.
///
/// If `true`, the allocation will still try to acquire pages from page resources even when a GC
/// is triggered by the polling.
///
/// If `false`, the allocation will not try to get pages from the page resource as long as a
/// GC is triggered.
///
/// Note that MMTk lets the GC trigger poll before trying to acquire pages from the page
/// resource. This gives the GC trigger a chance to trigger GC if needed. `allow_overcommit`
/// does not disable polling, but only controls whether to try acquiring pages when GC is
/// triggered.
pub allow_overcommit: bool,
Member

I think the doc is way more detailed than necessary.

None of these is related to 'over-commit':

  • MMTk will acquire pages
  • MMTk uses page resources

'Overcommit' only means one thing: MMTk may go beyond the specified heap size, in order to satisfy this allocation request. Additionally, MMTk still triggers GC when it overcommits memory.

Collaborator Author

Yes. I'll simplify the description and make the point that "when over-committing, it may allocate beyond the heap size".

Comment on lines 51 to 67
/// Whether the allocation is at a safepoint.
///
/// **The default is `true`**.
///
/// If `true`, the allocation is allowed to block for GC, and call [`Collection::out_of_memory`]
/// when out of memory. Specifically, it may block for GC if any of the following happens:
///
/// - The GC trigger polled and triggered a GC before the allocation tries to get more pages
/// from the page resource, and the allocation does not allow over-committing.
/// - The allocation tried to get more pages from the page resource, but failed. In this
/// case, it will force a GC.
///
/// If `false`, the allocation will immediately return a null address if the allocation cannot
/// be satisfied without a GC. It will never block for GC, never force a GC, and never call
/// [`Collection::out_of_memory`]. Note that the VM can always force a GC by calling
/// [`crate::MMTK::handle_user_collection_request`] with the argument `force` being `true`.
pub at_safepoint: bool,
Member

Same here. at_safepoint means MMTk may block the thread for GC in this allocation request. These are not related to at_safepoint:

  • MMTk may call out_of_memory. MMTk calls out_of_memory when it runs out of memory, which is unrelated to at_safepoint. Before we have a specific flag like allow_oom_call, there is no definition of when MMTk may call out_of_memory -- it is an implementation detail.
  • The reasons why MMTk may block for GC are implementation details.
  • handle_user_collection_request is unrelated. It is a separate API, and is not related to alloc_with_options.

Collaborator Author

We still have a method allow_oom_call. It previously returned true only for OnAllocationFail::RequestGC. Maybe we should add another option, AllocationOptions::allow_oom_call, for that.

handle_user_collection_request is relevant because if alloc_with_options(at_safepoint=false) cannot force a GC, and it can't satisfy the allocation request without a GC, the user would trigger a GC manually instead. Currently, it mimics the behavior of OnAllocationFail::RequestGC: it forces a GC when it is at a safepoint and GC is initialized.

Alternatively, we can make "forcing GC" (i.e. the second poll() invocation with space_full = true) unstoppable, too. That is, as long as we fail to get pages from the page resource, it will force a GC, but it may return null if it is not at a safepoint. This will slightly change the control flow. One concern is what will happen if GC is not initialized, we failed to get pages from the page resource, and we are not at a safepoint. Should it panic immediately, or should it return null? @qinsoon what do you think of this?

Member

My opinions on this are:

  1. Binding users do not need to know about the 'forced GC'. There is little point in mentioning it.
  2. at_safepoint only makes sure the thread will not be blocked in this call for GC. It does not do anything more than that.
  3. Whether we 'force' GC after a failed allocation is an implementation detail.
  4. I personally think we probably still want to 'force' the GC after a failed allocation, but not block the current thread. In this case, at_safepoint only changes the behavior of blocking, and does not change the behavior of GC triggering.

One concern is what will happen if GC is not initialized, we failed to get pages from the page resource, and we are not at a safepoint. Should it panic immediately, or should it return null?

With at_safepoint=true, the behavior is panic. We could keep that behavior with at_safepoint=false. at_safepoint doesn't need to change that behavior.

Collaborator Author

I updated the documentation so that at_safepoint no longer guarantees anything other than not blocking for GC.

I also made allow_oom_call another option. But the current behavior of allow_oom_call is quite inconsistent. It only controls Space::handle_obvious_oom_request and LockFreeImmortalSpace::acquire, but not Allocator::alloc_slow_inline or util::memory::handle_oom_error, both of which call Collection::out_of_memory. I didn't change the current behavior, and I am leaving it to another pull request.

@wks wks added this pull request to the merge queue Oct 17, 2025
Merged via the queue into mmtk:master with commit 1ffa5b3 Oct 17, 2025
31 of 32 checks passed
@wks wks deleted the feature/overcommit-still-triggers-gc3 branch October 17, 2025 02:34